Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

Authors: Daelemans Walter; Emmery Chris; Tulkens Stéphan
Publication date: 01/01/2016

Word embeddings have recently seen a strong increase in interest as a result of large performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets, and the difficulty of constructing these for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowing for unique insights into language use and variability. In this paper, we demonstrate the performance of multiple types of embeddings, created with both count-based and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation evaluation and dialect identification. For the latter, we compare unsupervised methods with a traditional, hand-crafted dictionary. With this research, we provide the embeddings themselves and the relation evaluation benchmark for use in further research, and we demonstrate how the benchmarked embeddings prove to be a useful unsupervised linguistic resource that can be used effectively in a downstream task.
Comment: in LREC 2016

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts

Authors: Daelemans Walter; Tulkens Stéphan; Šuster Simon
Publication date: 01/01/2016

In this paper, we report a knowledge-based method for Word Sense Disambiguation in the domains of biomedical and clinical text. We combine word representations created on large corpora with a small number of definitions from the UMLS to create concept representations, which we then compare to representations of the context of ambiguous terms. Using no relational information, we obtain performance comparable to previous approaches on the MSH-WSD dataset, a well-known dataset in the biomedical domain. Additionally, our method is fast, easy to set up, and easy to extend to other domains. Supplementary materials, including source code, can be found at https://github.com/clips/yarn
Comment: 6 pages, 1 figure, presented at the 15th Workshop on Biomedical Natural Language Processing, Berlin 2016

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen
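
The concept-representation idea in the abstract above (build a vector for each candidate concept from its definition, build a vector for the context of the ambiguous term, and pick the closest concept) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the yarn repository: averaging word vectors as the composition function, cosine similarity as the comparison, and the toy 2-d embeddings and concept names (`common_cold`, `cold_temperature`, standing in for UMLS concepts and definitions) are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 2-d toy embeddings; real ones would be trained on a large corpus.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "virus": [1.0, 0.0],
    "infection": [0.9, 0.1],
    "nose": [0.8, 0.2],
    "temperature": [0.0, 1.0],
    "low": [0.1, 0.9],
    "weather": [0.1, 1.0],
}

# Hypothetical tokenized concept definitions, standing in for UMLS definitions.
TOY_DEFINITIONS: Dict[str, List[str]] = {
    "common_cold": ["virus", "infection", "nose"],
    "cold_temperature": ["low", "temperature"],
}


def mean_vector(words: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of all in-vocabulary words (assumed composition function)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("none of the words has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(
    context: Sequence[str],
    definitions: Dict[str, List[str]],
    embeddings: Dict[str, Vector],
) -> str:
    """Return the concept whose definition vector is closest to the context vector."""
    context_vec = mean_vector(context, embeddings)
    scores = {
        concept: cosine(context_vec, mean_vector(words, embeddings))
        for concept, words in definitions.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Context words around a hypothetical ambiguous mention of "cold".
    print(disambiguate(["virus", "nose"], TOY_DEFINITIONS, TOY_EMBEDDINGS))  # prints common_cold
```

Because no relational information (e.g. links between UMLS concepts) is used, adding a new concept in this sketch only requires a definition and embeddings for its words, which is consistent with the abstract's claim that the approach is easy to extend to other domains.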

A Short Review of Ethical Challenges in Clinical Natural Language Processing

Authors: Daelemans Walter; Tulkens Stéphan; Šuster Simon
Publication date: 01/01/2017

Clinical NLP has immense potential to contribute to the way clinical practice will be revolutionized by the advent of large-scale processing of clinical records. However, this potential has remained largely untapped due to slow progress, caused primarily by strict data access policies for researchers. In this paper, we discuss the concern for privacy and the measures it entails. We also suggest sources of less sensitive data. Finally, we draw attention to biases that can compromise the validity of empirical research and lead to socially harmful applications.
Comment: First Workshop on Ethics in Natural Language Processing (EACL'17)

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Embarrassingly Simple Unsupervised Aspect Extraction

Authors: Tulkens Stéphan; van Cranenburgh Andreas
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

University of Groningen

The produsing expert consumer: co-constructing, resisting and accepting health-related claims on social media in response to an infotainment show about food and nutrition

Authors: Declercq Jana; Tulkens Stéphan; Van Leuven Sarah
Publication venue: SAGE Publications
Publication date: 01/01/2019

This article examines the Twitter and Facebook uptake of health messages from an infotainment TV show on food, as broadcast on Belgium’s Dutch-language public broadcaster. Both interest in health-related media coverage and the amount of such coverage are rising, and this coverage is an important source of information for laypeople that affects their health behaviours and therapy compliance. However, the role of the audience has also changed: consumers of media content are increasingly produsers and, in the case of health, expert consumers. To explore how current audiences react to health claims, we conducted a quantitative and qualitative content analysis of Twitter and Facebook reactions to an infotainment show about food and nutrition. We examine (1) which elements of the show the audience reacts to, to gain insight into the traction the nutrition-related content generates, and (2) whether audience members accept or resist the health information in the show. Our findings show that the information on health and production elicits the most reactions, and that health information incites much refutation, low acceptance, and many suggestions of new information or new angles to complement the show’s information.

Ghent University Academic Bibliography

Institutional Repository Universiteit Antwerpen

Embarrassingly Simple Unsupervised Aspect Extraction

Authors: Tulkens Stéphan; van Cranenburgh Andreas
Publication date: 01/01/2020

We present a simple but effective method for aspect identification in sentiment analysis. Our unsupervised method only requires word embeddings and a POS tagger, and is therefore straightforward to apply to new domains and languages. We introduce Contrastive Attention (CAt), a novel single-head attention mechanism based on an RBF kernel, which gives a considerable boost in performance and makes the model interpretable. Previous work relied on syntactic features and complex neural models. We show that, given the simplicity of current benchmark datasets for aspect extraction, such complex models are not needed. The code to reproduce the experiments reported in this paper is available at https://github.com/clips/cat
Comment: Accepted as an ACL 2020 short paper

arXiv.org e-Print Archive

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Institutional Repository Universiteit Antwerpen

Dissertations of the University of Groningen
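
The Contrastive Attention abstract can be illustrated with a rough sketch of a single-head attention whose weights come from an RBF kernel. This reflects one plausible reading of the abstract, not the code in the clips/cat repository: it assumes each word is scored by its summed kernel similarity exp(-gamma * ||w - a||^2) to a set of candidate aspect vectors (which could, for instance, be derived from nouns found with the POS tagger), that scores are normalized over all words in the sentence, and that the attended sentence vector is labeled by cosine similarity to hypothetical label vectors. All vectors, the `gamma` value, and the label names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def rbf_attention(
    sentence: Sequence[Vector], aspects: Sequence[Vector], gamma: float = 1.0
) -> List[float]:
    """One weight per word: summed kernel similarity to the candidate aspect
    vectors, normalized over all words in the sentence (assumed formulation)."""
    if not aspects:
        raise ValueError("need at least one candidate aspect vector")
    raw = [sum(rbf(w, a, gamma) for a in aspects) for w in sentence]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_sum(sentence: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Attention-weighted average of the word vectors."""
    dim = len(sentence[0])
    return [sum(wt * vec[i] for wt, vec in zip(weights, sentence)) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_label(
    sentence: Sequence[Vector],
    aspects: Sequence[Vector],
    labels: Dict[str, Vector],
    gamma: float = 1.0,
) -> str:
    """Label the attended sentence vector with the most similar label vector."""
    summary = weighted_sum(sentence, rbf_attention(sentence, aspects, gamma))
    return max(labels, key=lambda name: cosine(summary, labels[name]))


# Hypothetical toy data: a function word, an aspect-like noun, and a filler word.
SENTENCE = [[0.0, 0.1], [1.0, 0.0], [0.2, 0.2]]
ASPECTS = [[1.0, 0.0]]
LABELS = {"food": [1.0, 0.0], "ambience": [0.0, 1.0]}

if __name__ == "__main__":
    print(rbf_attention(SENTENCE, ASPECTS))
    print(assign_label(SENTENCE, ASPECTS, LABELS))  # prints food
```

Because the weights are normalized across the whole sentence, a word only receives high attention if it is closer to the aspect candidates than the other words in that sentence, and the weights themselves can be inspected directly, which fits the abstract's point about interpretability. In this sketch, a larger `gamma` makes the kernel more peaked, so attention concentrates on words very close to an aspect vector.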